Add real admission data pipeline and model calibration engine by YichengYang-Ethan · Pull Request #1 · MasterAgentAI/QuantPath

YichengYang-Ethan · 2026-03-15T21:40:52Z

- core/admission_data.py: CSV loader with GPA normalization (4/4.3/5/100 scales), background tier classification, internship scoring, and per-program statistics with feature importance analysis
- core/calibrator.py: Calibration engine that computes data-driven GPA thresholds, predicts outcomes, evaluates model accuracy, and generates school_ranker overrides
- Updated school_ranker to accept calibration overrides for data-driven reach/target/safety classification
- CLI: added 'stats' and 'calibrate' commands
- data/admissions/sample.csv: 30 sample records across 11 programs
- 45 new tests (218 total), all passing; ruff clean

https://claude.ai/code/session_014dkZ9Eq3DPVaUfRTeN2HXp

Copilot

Pull request overview

This PR introduces a real admissions data pipeline and a calibration engine to derive data-driven thresholds/feature weights from historical outcomes, and integrates those thresholds into the school ranking flow via optional overrides.

Changes:

- Added core/admission_data.py to load/normalize admissions CSVs and compute per-program statistics (including feature importance).
- Added core/calibrator.py to compute calibrated program thresholds, evaluate prediction accuracy, and generate school_ranker override dictionaries.
- Extended CLI and school_ranker to consume/show calibration outputs; added sample/template CSVs and new tests for the new modules.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| core/admission_data.py | CSV loader + GPA normalization + background tiering + internship scoring + stats/feature importance. |
| core/calibrator.py | Calibration engine (thresholds, accuracy evaluation, recommendations) + ranker override generation. |
| core/school_ranker.py | Adds optional `calibration_overrides` and override-aware reach/target/safety classification. |
| cli/main.py | Adds `stats` and `calibrate` CLI commands to summarize data and run calibration. |
| data/admissions/template.csv | Adds admissions CSV header template. |
| data/admissions/sample.csv | Adds sample admissions dataset for demonstration/testing. |
| tests/test_admission_data.py | New unit tests for CSV loading, normalization, scoring, and stats computation. |
| tests/test_calibrator.py | New unit tests for calibration, prediction, and override generation. |

core/admission_data.py

+    if scale == 4:
+        return min(4.0, gpa)
+
+    breakpoints = _GPA_SCALE_TO_4.get(scale)
+    if breakpoints is None:
+        # Unknown scale — attempt linear conversion
+        return min(4.0, gpa * 4.0 / scale)
+
+    for threshold, mapped_lo, mapped_hi in breakpoints:
+        if gpa >= threshold:
+            # Find the top of this segment
+            # For the highest segment, cap at max GPA
+            seg_top = scale if breakpoints[0] == (threshold, mapped_lo, mapped_hi) else threshold
+            # Use previous segment's threshold as the top
+            idx = breakpoints.index((threshold, mapped_lo, mapped_hi))
+            if idx == 0:
+                seg_top = scale
+            else:
+                seg_top = breakpoints[idx - 1][0]
+
+            if seg_top == threshold:
+                return mapped_hi
+
+            frac = (gpa - threshold) / (seg_top - threshold)
+            return mapped_lo + frac * (mapped_hi - mapped_lo)
+
+    return 0.0
+
+
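The hunk above maps a GPA onto the 4.0 scale by interpolating inside one segment of a breakpoint table, and it assigns `seg_top` twice before using it. The following is a minimal sketch of the same piecewise-linear idea that tracks the segment top while iterating, so no `index()` lookup is needed. The breakpoint values are hypothetical, since the real `_GPA_SCALE_TO_4` table is not shown in this diff, and the function name is illustrative only.

```python
# Hypothetical breakpoints for a 100-point scale, highest segment first:
# (lower bound on the source scale, mapped value at the lower bound,
#  mapped value at the top of the segment).
# These numbers are illustrative and are not taken from the PR.
EXAMPLE_BREAKPOINTS = [
    (90.0, 3.7, 4.0),
    (80.0, 3.0, 3.7),
    (70.0, 2.0, 3.0),
    (60.0, 1.0, 2.0),
]


def normalize_gpa_sketch(gpa, scale, breakpoints=EXAMPLE_BREAKPOINTS):
    """Piecewise-linear mapping of gpa onto a 4.0 scale (sketch)."""
    gpa = min(gpa, scale)  # clamp so interpolation never overshoots the top segment
    seg_top = scale  # the highest segment ends at the scale maximum
    for threshold, mapped_lo, mapped_hi in breakpoints:
        if gpa >= threshold:
            if seg_top == threshold:
                return mapped_hi  # zero-width segment, avoid dividing by zero
            frac = (gpa - threshold) / (seg_top - threshold)
            return mapped_lo + frac * (mapped_hi - mapped_lo)
        seg_top = threshold  # the next (lower) segment ends where this one starts
    return 0.0  # below the lowest breakpoint
```

With the example table, a score of 85 on the 100-point scale lands halfway through the 80 to 90 segment and maps to roughly 3.35.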


core/admission_data.py

+    # Count internships (Chinese: 段)
+    for char in "段":
+        count = desc.count(char)
+        if count > 0:
+            # Extract number before 段
+            for i, c in enumerate(desc):
+                if c == "段":
+                    if i > 0 and desc[i - 1].isdigit():
+                        n = int(desc[i - 1])
+                        score += min(n * 1.5, 5.0)
+                        break
+
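The hunk above reads only the single character before 段, so a description such as "12段" would be counted as 2, and `str.isdigit()` also accepts characters like "²" that `int()` rejects. One possible alternative, sketched here with hypothetical names and the same 1.5-points-per-internship and 5.0 cap as the hunk, is a regular expression that captures the whole run of ASCII digits. Chinese numerals such as "三段" are not handled by either version.

```python
import re

# A run of ASCII digits immediately before 段, e.g. "3段" or "12段".
_SEGMENT_COUNT = re.compile(r"([0-9]+)段")


def internship_count_points(desc, per_internship=1.5, cap=5.0):
    """Points for the first "<n>段" count found in desc (sketch)."""
    match = _SEGMENT_COUNT.search(desc)
    if match is None:
        return 0.0
    return min(int(match.group(1)) * per_internship, cap)
```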


core/calibrator.py

+        for feat in feature_sums
+    }
+    total = sum(raw.values()) or 1.0
+    return {feat: round(val / total, 3) for feat, val in sorted(raw.items(), key=lambda x: -x[1])}
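The two lines above normalize raw per-feature scores so they read as shares of a total, sorted from largest to smallest. A small standalone sketch of that step follows; the function name and the sample input are hypothetical, and how `raw` is derived from `feature_sums` is not visible in this hunk.

```python
from typing import Dict


def normalize_importance(raw: Dict[str, float]) -> Dict[str, float]:
    """Scale raw scores to shares of their sum, largest first (sketch)."""
    total = sum(raw.values()) or 1.0  # avoid dividing by zero when every score is 0
    ranked = sorted(raw.items(), key=lambda item: item[1], reverse=True)
    return {feat: round(val / total, 3) for feat, val in ranked}
```

Because each share is rounded to three decimals independently, the returned values may not sum to exactly 1.0.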


core/calibrator.py

+    threshold = ProgramThreshold(
+        program_id=stats.program_id,
+        sample_size=stats.total_records,
+        confidence=_confidence_level(stats.total_records),
+        observed_acceptance_rate=stats.observed_acceptance_rate,
+    )
+
+    if accepted:
+        # GPA floor: minimum GPA among accepted applicants
+        gpas_accepted = [r.gpa_normalized for r in accepted]
+        threshold.gpa_floor = min(gpas_accepted)
+        threshold.gpa_target = sum(gpas_accepted) / len(gpas_accepted)
+        # Safe threshold: 90th percentile of accepted
+        sorted_gpas = sorted(gpas_accepted)
+        p90_idx = int(len(sorted_gpas) * 0.9)
+        threshold.gpa_safe = sorted_gpas[min(p90_idx, len(sorted_gpas) - 1)]
+
+        # Background tier
+        threshold.max_bg_tier_accepted = max(r.bg_tier for r in accepted)
+
+        # Intern score
+        intern_scores = [r.intern_score for r in accepted]
+        threshold.min_intern_score_accepted = min(intern_scores)
+
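The GPA thresholds in the hunk above are a minimum, a mean, and an index-based 90th percentile over accepted applicants. The sketch below reproduces that arithmetic in isolation so the small-sample behavior is easy to check; the function name and the returned dictionary are hypothetical and stand in for the `ProgramThreshold` fields.

```python
from typing import Dict, List


def gpa_thresholds_sketch(gpas_accepted: List[float]) -> Dict[str, float]:
    """Floor, target, and safe GPA from accepted applicants (sketch)."""
    if not gpas_accepted:
        raise ValueError("need at least one accepted applicant")
    ordered = sorted(gpas_accepted)
    # Same index rule as the hunk: truncate n * 0.9, then clamp to the last element.
    p90_idx = min(int(len(ordered) * 0.9), len(ordered) - 1)
    return {
        "gpa_floor": ordered[0],
        "gpa_target": sum(ordered) / len(ordered),
        "gpa_safe": ordered[p90_idx],
    }
```

With this index rule, `gpa_safe` equals the highest accepted GPA whenever there are ten or fewer accepted records, which is worth keeping in mind for a 30-record sample spread across 11 programs.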


core/school_ranker.py

+            result_entry["calibrated"] = True
+            result_entry["confidence"] = prog_overrides.get("confidence", "low")
+            result_entry["sample_size"] = prog_overrides.get("sample_size", 0)
+


tests/test_admission_data.py

+from __future__ import annotations
+
+import csv
+import tempfile


core/admission_data.py

+    # Quality keywords (Chinese + English)
+    quality_keywords = {
+        "顶级": 2.0, "top": 1.5, "百亿": 1.5, "头部": 1.5,
+        "一线": 1.0, "知名": 0.8, "大型": 0.5,
+    }
+    for kw, pts in quality_keywords.items():
+        if kw in desc:
+            score += pts
+
+    # Type keywords
+    type_keywords = {
+        "量化": 1.5, "quant": 1.5, "投行": 1.5, "ib": 1.0,
+        "对冲": 1.5, "hedge": 1.5, "私募": 1.0, "qr": 1.0,
+        "trading": 1.0, "研究": 0.8, "金工": 0.8,
+        "三中一华": 2.0, "高盛": 2.0, "goldman": 2.0,
+        "摩根": 2.0, "morgan": 1.5, "kaggle": 1.5,
+    }
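The quality-keyword loop in this hunk uses plain substring tests (`kw in desc`). If the type keywords are matched the same way, short ASCII entries such as "ib", "qr", or "top" can also match inside unrelated words like "contribution" or "laptop". A hedged sketch of one way to tighten this is shown below: ASCII keywords must not be surrounded by other ASCII letters or digits, while Chinese keywords keep substring matching. The function name is hypothetical, and lowercasing `desc` is an assumption, since the diff does not show whether the caller already does so.

```python
import re


def keyword_points(desc, keywords):
    """Sum points for keywords found in desc (sketch)."""
    text = desc.lower()
    score = 0.0
    for kw, pts in keywords.items():
        if kw.isascii():
            # ASCII lookarounds instead of \b, because \b treats CJK characters
            # as word characters and would miss "高盛ib实习".
            pattern = rf"(?<![a-z0-9]){re.escape(kw)}(?![a-z0-9])"
            hit = re.search(pattern, text) is not None
        else:
            hit = kw in text
        if hit:
            score += pts
    return score
```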


core/school_ranker.py

+        # Classification (with optional data-driven overrides).
+        prog_overrides = overrides.get(prog.id)
        category = _classify(
            user_gpa=profile.gpa,
            program_avg_gpa=prog.avg_gpa,
            acceptance_rate=prog.acceptance_rate,
+            overrides=prog_overrides,
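The body of `_classify` is not part of this hunk. As a rough sketch of how override-aware classification could work, the function below uses the `reach_gpa_threshold` and `safety_gpa_threshold` keys that the CLI output in this PR prints ("reach<" and "safe>="). The function name and the fallback branch are placeholders invented for illustration and do not reflect the actual school_ranker heuristics.

```python
def classify_sketch(user_gpa, program_avg_gpa, acceptance_rate, overrides=None):
    """Return "reach", "target", or "safety" (sketch, not the PR's _classify)."""
    if overrides:
        # Data-driven path: thresholds come from calibration.
        if user_gpa < overrides["reach_gpa_threshold"]:
            return "reach"
        if user_gpa >= overrides["safety_gpa_threshold"]:
            return "safety"
        return "target"

    # Hypothetical fallback when no calibration data exists for the program.
    gap = user_gpa - program_avg_gpa
    if gap < -0.1 or acceptance_rate < 0.10:
        return "reach"
    if gap > 0.2 and acceptance_rate > 0.25:
        return "safety"
    return "target"
```

Passing `overrides.get(prog.id)` as in the hunk means programs without calibration data receive `None` and fall through to the non-calibrated path.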


cli/main.py

+            console.print(Panel("Ranker Overrides (Applied)", border_style="green"))
+            for pid, ov in sorted(overrides.items()):
+                console.print(
+                    f"  {pid}: reach<{ov['reach_gpa_threshold']:.2f} "
+                    f"safe>={ov['safety_gpa_threshold']:.2f} "
+                    f"[dim](n={ov['sample_size']}, {ov['confidence']})[/dim]"
+                )


tests/test_calibrator.py

+from core.admission_data import AdmissionRecord
+from core.calibrator import (
+    CalibrationResult,
+    ProgramThreshold,
+    calibrate_all,
+    calibrate_program,
+    generate_ranker_overrides,
+    predict_outcome,
+)
+from core.admission_data import compute_program_stats


YichengYang-Ethan requested a review from Copilot March 15, 2026 21:41

Copilot started reviewing on behalf of YichengYang-Ethan March 15, 2026 21:41

YichengYang-Ethan merged commit 9c534e3 into main Mar 15, 2026
5 checks passed

Copilot AI reviewed Mar 15, 2026

YichengYang-Ethan merged 1 commit into main from claude/analyze-project-uGJQp